Abstract

In 1988, Leighton, Maggs, and Rao showed that for any network and any set of packets whose paths through
the network are fixed and edge-simple, there exists a schedule for routing the packets to their destinations in
$O(c + d)$ steps using constant-size queues, where c is the congestion of the paths in the network, and d is the
length of the longest path. The proof, however, used the Lovász Local Lemma and was not constructive. In
this paper, we show how to find such a schedule in $O(V (\log\log V) \log V)$ time, with probability $1 - 1/V^\beta$, for
any positive constant $\beta$, where V is the sum of the lengths of the paths taken by the packets in the network.
We also show how to parallelize the algorithm so that it runs in NC. The method that we use to construct
the schedules is based on the algorithmic form of the Lovász Local Lemma discovered by Beck.

1 Introduction


In  this  paper,  we  consider  the  problem  of  scheduling  the  movements  of  packets  whose  paths  through  a 
network  have  already  been  determined.  The  problem  is  formalized  as  follows.  We  are  given  a  network  with 
n  nodes  (switches)  and  m  edges  (channels).  Each  node  can  serve  as  the  source  or  destination  of  an  arbitrary 
number  of  messages.  Each  message  consists  of  an  arbitrary  number  of  packets  (or  cells  or  flits,  as  they  are 
sometimes  referred  to).  Let  N  denote  the  total  number  of  packets  to  be  routed.  (In  a  dynamic  setting,  N 
would denote the rate at which packets enter the network. For simplicity, we will consider a static scenario
in  which  a  total  of  N  packets  are  to  be  routed  through  the  network.)  The  goal  is  to  route  the  N  packets 
from  their  origins  to  their  destinations  via  a  series  of  synchronized  time  steps,  where  at  each  step  at  most 
one  packet  can  traverse  each  edge. 

Figure 1 shows a 5-node network in which one packet is to be routed to each node. The shaded nodes in
the figure represent switches, and the edges between the nodes represent channels. A packet is depicted as
a  square  box  containing  the  label  of  its  destination. 

During  the  routing,  packets  wait  in  three  different  kinds  of  queues.  Before  the  routing  begins,  packets 
are  stored  at  their  origins  in  special  initial  queues.  When  a  packet  traverses  an  edge,  it  enters  the  edge  queue 
at  the  end  of  that  edge.  A  packet  can  traverse  an  edge  only  if  at  the  beginning  of  the  step,  the  edge  queue 
at  the  end  of  that  edge  is  not  full.  Upon  traversing  the  last  edge  on  its  path,  a  packet  is  removed  from  the 
edge queue and placed in a special final queue at its destination. In Figure 1, all of the packets reside in
initial queues. For example, packets 4 and 5 are stored in the initial queue at node 1. In this example, each
edge queue is empty, but has the capacity to hold two packets. Final queues are not shown in the figure.
Independent of the routing algorithm used, the sizes of the initial and final queues are determined by the
particular  packet  routing  problem  to  be  solved.  Thus,  any  bound  on  the  maximum  queue  size  required  by  a 
routing  algorithm  refers  only  to  the  edge  queues. 

This  paper  focuses  on  the  problem  of  timing  the  movements  of  the  packets  along  their  paths.  A  schedule 
for  a  set  of  packets  specifies  which  move  and  which  wait  at  each  time  step.  Given  any  underlying  network, 
and  any  selection  of  paths  for  the  packets,  our  goal  is  to  produce  a  schedule  for  the  packets  that  minimizes 
the  total  time  and  the  maximum  queue  size  needed  to  route  all  of  the  packets  to  their  destinations.  We 
would  also  like  to  ensure  that  any  two  packets  traveling  along  the  same  path  to  the  same  destination  always 
proceed  in  order. 

Of  course,  there  is  a  strong  correlation  between  the  time  required  to  route  the  packets  and  the  selection 
of  the  paths.  In  particular,  the  maximum  distance,  d,  traveled  by  any  packet  is  always  a  lower  bound  on 
the  time.  We  call  this  distance  the  dilation  of  the  paths.  Similarly,  the  largest  number  of  packets  that  must 
traverse  a  single  edge  during  the  entire  course  of  the  routing  is  a  lower  bound.  We  call  this  number  the 
congestion, c, of the paths. Figure 2 shows a set of paths for the packets of Figure 1 with dilation 3 and
congestion  3. 
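As an illustration of these two definitions, the following Python sketch computes c and d for a set of paths. The function name `congestion_and_dilation` and the example paths are hypothetical; in particular, the paths are not those of Figure 2, although they happen to have the same congestion and dilation.

```python
from collections import Counter

def congestion_and_dilation(paths):
    """paths: list of paths, each given as a list of directed edges (u, v).

    Returns (c, d): c is the largest number of paths sharing one edge,
    d is the number of edges on the longest path.
    """
    edge_use = Counter(edge for path in paths for edge in path)
    c = max(edge_use.values(), default=0)
    d = max((len(path) for path in paths), default=0)
    return c, d

# Hypothetical example: three packets all need edge (1, 2).
paths = [
    [(0, 1), (1, 2), (2, 3)],
    [(4, 1), (1, 2)],
    [(1, 2), (2, 4)],
]
c, d = congestion_and_dilation(paths)
print(c, d)
```

Both quantities are lower bounds on the length of any schedule, so max(c, d) steps are always necessary.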

1.1  Previous  and  related  work 

Given  any  set  of  paths  with  congestion  c  and  dilation  d,  in  any  network,  it  is  straightforward  to  route  all 
of  the  packets  to  their  destinations  in  cd  steps  using  queues  of  size  c  at  each  edge.  In  this  case  the  queues 
are  big  enough  that  a  packet  can  never  be  delayed  by  a  full  queue  in  front,  so  each  packet  can  be  delayed  at 
most c - 1 steps at each of at most d edges on the way to its destination.
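The cd bound can be checked on small instances with a simulation. The sketch below is only one possible greedy rule, chosen here for simplicity: at every step each edge forwards the waiting packet with the smallest index, and queues are assumed large enough never to block. The function name `route_greedily` and the example paths are hypothetical.

```python
def route_greedily(paths):
    """Simulate greedy routing with unbounded edge queues.

    paths: list of paths, each a list of directed edges.
    At each step every edge forwards at most one packet (lowest index first).
    Returns the number of steps until all packets have arrived.
    """
    position = [0] * len(paths)   # number of edges each packet has traversed
    steps = 0
    while any(position[i] < len(path) for i, path in enumerate(paths)):
        chosen = {}               # edge -> packet that crosses it in this step
        for i, path in enumerate(paths):
            if position[i] < len(path):
                chosen.setdefault(path[position[i]], i)
        for i in chosen.values():
            position[i] += 1
        steps += 1
    return steps

# Hypothetical example with congestion 3 and dilation 3.
paths = [
    [(0, 1), (1, 2), (2, 3)],
    [(4, 1), (1, 2)],
    [(1, 2), (2, 4)],
]
steps = route_greedily(paths)
print(steps)
```

A packet waiting for an edge is overtaken only by other packets that use the same edge, which is why the delay per edge is at most c - 1 in this simulation.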

In  [9],  Leighton,  Maggs,  and  Rao  showed  that  there  are  much  better  schedules.  In  particular,  they 
established the existence of a schedule using $O(c + d)$ steps and constant-size queues at every edge, thereby
achieving the naive lower bounds, to within a constant factor, for any routing problem. The result is highly
robust in the sense that it works for any set of edge-simple paths and any underlying network. (A priori, it
would be easy to imagine that there might be some set of paths on some network that required more than
$\Omega(c + d)$ steps or larger than constant-size queues to route all the packets.) The method that they used to
show the existence of optimal schedules, however, is not constructive. In other words, the fastest known
algorithms for producing schedules of length $O(c + d)$ with constant-size edge queues require time that is
exponential in the number
of  packets. 

For the class of leveled networks, Leighton, Maggs, Ranade, and Rao [8] showed that there is a simple
on-line randomized algorithm for routing the packets to their destinations within $O(c + L + \log N)$ steps,
with high probability, where L is the number of levels in the network, and N is the total number of packets.
(In a leveled network with L levels, each node is labeled with a level number between 0 and $L - 1$, and every
edge that has its tail on level i has its head on level $i + 1$, for $0 \le i < L - 1$.)

Figure 1: A graph model for packet routing.

Mansour  and  Patt-Shamir  [10]  then  showed  that  if  packets  are  routed  greedily  on  shortest  paths,  then 
all  of  the  packets  reach  their  destinations  within  d  +  N  steps,  where  N  is  the  total  number  of  packets.  These 
schedules  may  be  much  longer  than  optimal,  however,  because  N  may  be  much  larger  than  c. 

Recently Meyer auf der Heide and Vöcking [11] devised a simple on-line randomized algorithm that routes
all packets to their destinations in $O(c + d + \log N)$ steps, with high probability, provided that the paths
taken  by  the  packets  are  short-cut  free  (e.g.,  shortest  paths). 

1.2  Our  results 

In this paper, we show how to produce schedules of length $O(c + d)$ in $O(V (\log\log V) \log V)$ time, with
probability at least $1 - 1/V^\beta$, for any constant $\beta > 0$, where V is the sum of the lengths of the paths
taken by the packets. The schedules can also be found in polylogarithmic time on a parallel computer using
$O(V (\log\log V) \log V)$ work, with probability at least $1 - 1/V^\beta$.

The algorithm for producing the schedules is based on an algorithmic form of the Lovász Local Lemma
(see [6] or [13, pp. 57-58]) discovered by Beck [3]. Showing how to modify Beck's arguments so that they can
be applied to scheduling problems is the main contribution of the paper. Once this is done, the construction
of optimal routing schedules is accomplished using the methods of [9].

The  result  has  several  applications.  For  example,  if  a  particular  routing  problem  is  to  be  performed  many 
times  over,  then  it  may  be  feasible  to  compute  the  optimal  schedule  once  using  global  control.  This  situation 
arises  in  network  emulation  problems.  Typically,  a  guest  network  G  is  emulated  by  a  host  network  H  by 
embedding G into H. (For a more complete discussion of emulations and embeddings, see [7].) An embedding
maps nodes of G to nodes of H, and edges of G to paths in H. There are three important measures of an
embedding: the load, congestion, and dilation. The load of an embedding is the maximum number of nodes
of G that are mapped to any one node of H. The congestion is the maximum number of paths corresponding
to edges of G that use any one edge of H. The dilation is the length of the longest path. Let l, c, and
d denote the load, congestion, and dilation of the embedding. Once G has been embedded in H, H can
emulate G in a step-by-step fashion. Each node of H first emulates the local computations performed by
the l (or fewer) nodes mapped to it. This takes $O(l)$ time. Then for each packet sent along an edge of G, H
sends a packet along the corresponding path in the embedding. The algorithm described in this paper can
be used to produce a schedule in which the packets are routed to their destinations in $O(c + d)$ steps. Thus,
H can emulate each step of G in $O(l + c + d)$ steps.
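The three measures of an embedding are easy to compute once the embedding is written down explicitly. The following Python sketch assumes a hypothetical representation: `node_map` sends each node of G to a node of H, and `edge_paths` sends each edge of G to a list of edges of H. Neither name comes from [7] or [9].

```python
from collections import Counter

def embedding_measures(node_map, edge_paths):
    """Return (load, congestion, dilation) of an embedding of G into H.

    node_map:   dict guest node -> host node
    edge_paths: dict guest edge -> list of host edges forming its path
    """
    load = max(Counter(node_map.values()).values(), default=0)
    host_edge_use = Counter(e for path in edge_paths.values() for e in path)
    congestion = max(host_edge_use.values(), default=0)
    dilation = max((len(path) for path in edge_paths.values()), default=0)
    return load, congestion, dilation

# Hypothetical embedding of a 4-node guest into a 3-node host path x - y - z.
node_map = {"a": "x", "b": "x", "c": "y", "d": "z"}
edge_paths = {
    ("a", "c"): [("x", "y")],
    ("b", "c"): [("x", "y")],
    ("c", "d"): [("y", "z")],
    ("a", "d"): [("x", "y"), ("y", "z")],
}
measures = embedding_measures(node_map, edge_paths)
print(measures)
```

With these values, one step of G costs on the order of l + c + d steps of H under the schedule described above.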

The result also has applications to job-shop scheduling. In particular, consider a scheduling problem with
jobs $j_1, \ldots, j_r$, and machines $m_1, \ldots, m_s$, for which each job must be performed on a specified sequence of
machines. In this application, we assume that each job occupies each machine that works on it for a unit
of time, and that no machine has to work on any job more than once. Of course, the jobs correspond to
packets, and the machines correspond to edges in the packet routing problem. Hence, we can define the
dilation of the scheduling problem to be the maximum number of machines that must work on any job, and
the congestion to be the maximum number of jobs that have to be run on any machine. As a consequence of
the packet routing result, we know that any scheduling problem can be solved in $O(c + d)$ steps. In addition,
we know that there is a schedule for which each job waits at most $O(c + d)$ steps before it starts running,
and that each job waits at most a constant number of steps in between consecutive machines. The number of
jobs queued at any machine is also always at most a constant.

Figure 2: A set of paths for the packets with dilation d = 3 and congestion c = 3.

1.3  Outline 

The  remainder  of  the  paper  is  divided  into  sections  as  follows.  In  Section  2,  we  give  a  very  brief  overview  of  the 
non-constructive proof in [9]. We also introduce some definitions, and present two important lemmas which
will  be  of  later  use.  In  Section  3,  we  describe  how  to  make  the  non-constructive  method  in  [9]  constructive, 
and  analyze  its  running  time.  In  Section  4,  we  show  how  to  parallelize  the  scheduling  algorithm.  We  conclude 
with  some  remarks  in  Section  5. 


2  Preliminaries 

In [9], Leighton, Maggs, and Rao proved that for any set of packets whose paths are edge-simple¹ and have
congestion c and dilation d, there is a schedule of length $O(c + d)$ in which at most one packet traverses
each edge of the network at each step, and at most a constant number of packets wait in each queue at each
step. Note that there are no restrictions on the size, topology, or degree of the network or on the number of
packets.

The strategy for constructing an efficient schedule is to make a succession of refinements to the “greedy”
schedule, $S_0$, in which each packet moves at every step until it reaches its final destination. This initial
schedule is as short as possible; its length is only d. Unfortunately, as many as c packets may use an edge at
a single time step in $S_0$, whereas in the final schedule at most one packet is allowed to use an edge at each
step. Each refinement will bring us closer to meeting this requirement.

¹An edge-simple path uses no edge more than once.

The proof uses the Lovász Local Lemma ([6] or [13, pp. 57-58]) at each refinement step. Given a set of
“bad” events in a probability space, the lemma provides a simple inequality which, when satisfied, guarantees
that with probability greater than zero, no bad event occurs. The inequality relates the probability that
each bad event occurs with the dependence among them. A set of events $A_1, A_2, \ldots, A_m$ in a probability
space has dependence at most b if every event is mutually independent of some set of $m - b - 1$ other bad
events. The lemma is non-constructive; for a discrete probability space, it shows only that there exists some
elementary outcome that is not in any bad event.

Lemma [Lovász] Let $A_1, A_2, \ldots, A_m$ be a set of “bad” events, each occurring with probability p with
dependence at most b. If $4pb < 1$, then with probability greater than zero, no bad event occurs. □

Before proceeding, we need to introduce some notation. A T-frame is a sequence of T consecutive time
steps. The frame congestion, $C_f$, in a T-frame is the largest number of packets that traverse any edge in the
frame. The relative congestion, R, in a T-frame is the ratio $C_f/T$ of the congestion in the frame to the size
of the frame.
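To make the notation concrete, the following Python sketch computes the largest relative congestion over all T-frames of a schedule. The representation of a schedule as per-packet lists of (step, edge) pairs, the function name `relative_congestion`, and the example are assumptions made for illustration only.

```python
from collections import defaultdict

def relative_congestion(schedule, T):
    """Largest relative congestion C_f / T over all T-frames.

    schedule: one list per packet of (step, edge) pairs, meaning that the
              packet traverses `edge` at time step `step`.
    """
    times = defaultdict(list)            # edge -> steps at which it is used
    for moves in schedule:
        for step, edge in moves:
            times[edge].append(step)
    worst = 0
    for steps in times.values():
        steps.sort()
        lo = 0
        for hi, t in enumerate(steps):
            # Keep only traversals inside the T-frame that ends at step t.
            while steps[lo] <= t - T:
                lo += 1
            worst = max(worst, hi - lo + 1)
    return worst / T

# Hypothetical greedy schedule: every packet moves at every step.
s0 = [
    [(0, (0, 1)), (1, (1, 2)), (2, (2, 3))],
    [(0, (4, 1)), (1, (1, 2))],
    [(0, (1, 2)), (1, (2, 4))],
]
print(relative_congestion(s0, 1), relative_congestion(s0, 2))
```

In this toy schedule two packets cross edge (1, 2) in the same step, so the relative congestion in 1-frames exceeds 1 and the schedule is not yet legal.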


2.1  A  pair  of  tools  for  later  use 

In  this  section  we  re-state  Lemma  3.5  of  [9]  and  we  prove  Proposition  3.6,  which  replaces  Lemma  3.6  of  [9]. 
Both  will  be  used  in  the  proofs  through  Section  3. 

Lemma 3.5 [9] In any schedule, if the number of packets that use a particular edge g in any y-frame is at
most Ry, for all y between T and $2T - 1$, then the number of packets that use g in any y-frame is at most
Ry, for all $y \ge T$.


Proof: Consider a frame $\tau$ of size T', where $T' > 2T - 1$. The first $(\lfloor T'/T \rfloor - 1)T$ steps of the frame can be
broken into T-frames. In each of these frames, at most RT packets use g. The remainder of the T'-frame $\tau$
consists of a single y-frame, where $T \le y \le 2T - 1$, in which at most Ry packets use g. Summing over these
pieces, at most RT' packets use g in $\tau$. □

The  following  proposition  will  be  used  in  place  of  Lemma  3.6  of  [9]. 


Proposition 3.6 Suppose that there are positive constants $\alpha_1$, $\alpha_2$, where $\alpha_1 > \alpha_2$, and $\rho$, such that in a schedule
of size $I^{\alpha_1}$ (or smaller) the relative congestion is at most $\rho$ in frames of size $I^{\alpha_2}$ or larger, and let q be the
number of edges traversed by the packets in this schedule. Furthermore, suppose that each packet is assigned a
delay chosen randomly, independently, and uniformly from the range $[0, I^{\alpha_2}]$ and that if a packet is assigned
a delay of x, then x delays are inserted in the first $I^{\alpha_3}$ steps of the schedule and $I^{\alpha_2} - x$ delays are inserted
in the last $I^{\alpha_3}$ steps, where $\alpha_3$ is also a positive constant. Then for any constant $\xi > 0$, with probability at
least $1 - q/I^{\xi}$ the relative congestion in any frame of size $\log^2 I$ or larger in-between the first and last $I^{\alpha_3}$
steps in the new schedule is at most $\rho(1 + \sigma)$, for some positive $\sigma = O(1)/\sqrt{\log I}$.


Proof: To bound the relative congestion in frames of size $\log^2 I$ or larger, we need to consider all q edges
and, by Lemma 3.5, all frames of size between $\log^2 I$ and $2\log^2 I - 1$.

As we shall see, the number of packets that use an edge g during a particular T-frame $\tau$ has a binomial
distribution. In the new schedule, a packet can use g during $\tau$ only if in the original schedule it used g during
$\tau$ or during one of the $I^{\alpha_2}$ steps before the start of $\tau$. Since the relative congestion in any frame of size $I^{\alpha_2}$
or greater in the original schedule is at most $\rho$, there are at most $\rho(I^{\alpha_2} + T)$ such packets. The probability
that an individual packet that could use g during $\tau$ actually does so is at most $T/I^{\alpha_2}$. Thus, the probability
p that $\rho' T$ or more packets use an edge g during a particular T-frame $\tau$ is at most

$$p \le \sum_{k=\rho' T}^{\rho(I^{\alpha_2}+T)} \binom{\rho(I^{\alpha_2}+T)}{k} \left(\frac{T}{I^{\alpha_2}}\right)^{k} \left(1 - \frac{T}{I^{\alpha_2}}\right)^{\rho(I^{\alpha_2}+T)-k}.$$

To estimate the area under the tails of this binomial distribution, we use the following Chernoff-type
bound [5]. Suppose that there are x independent Bernoulli trials, each of which is successful with probability
p'. Let S denote the number of successes in the x trials, and let $\mu = E[S] = xp'$. Following Angluin and
Valiant [2], we have

$$\Pr[S \ge (1 + \gamma)\mu] \le e^{-\gamma^2 \mu/3}$$

for $0 \le \gamma \le 1$.

In our application, $x = \rho(I^{\alpha_2} + T)$, $p' = T/I^{\alpha_2}$, and $\mu = \rho(I^{\alpha_2} + T)T/I^{\alpha_2}$. For $\gamma = \sqrt{3k_0/\log I}$ (where
$k_0$ is any positive constant), $\rho \ge 1$, and $T \ge \log^2 I$, we have

$$\Pr[S \ge (1 + \gamma)\mu] \le e^{-k_0 \rho (I^{\alpha_2}+T)T/(I^{\alpha_2} \log I)} \le e^{-k_0 \log I} \le e^{-k_0 \ln I} = 1/I^{k_0}.$$

Setting $\rho' T = (1 + \gamma)\mu = (1 + \sqrt{3k_0/\log I})\,\rho(I^{\alpha_2} + T)T/I^{\alpha_2}$, we have
$\rho' \le \rho(1 + k_1/\sqrt{\log I})$, since $\log^2 I/I^{\alpha_2} < 1/\sqrt{\log I}$, for I large enough, for some constant $k_1 > 0$ (that
depends on $k_0$). Let $\sigma = k_1/\sqrt{\log I}$. Then $\rho' \le \rho(1 + \sigma)$. Thus $p = \Pr[S \ge \rho' T] \le \Pr[S \ge (1 + \gamma)\mu] \le 1/I^{k_0}$.

Since there are at most $(I^{\alpha_1} + I^{\alpha_2}) \le 2I^{\alpha_1}$ starting points for a frame, and $\log^2 I$ different size frames
starting at each point, and there are at most q distinct edges per frame, the probability that the relative
congestion is more than $\rho'$ in any frame is at most $q \cdot 2I^{\alpha_1} \log^2 I / I^{k_0} \le q/I^{k_0 - \alpha_1 - 2}$ (since $2\log^2 I \le I^2$).

Setting $k_0 = \xi + \alpha_1 + 2$ completes the proof. □
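The Chernoff-type bound of Angluin and Valiant used in this proof, $\Pr[S \ge (1+\gamma)\mu] \le e^{-\gamma^2\mu/3}$ for $0 \le \gamma \le 1$, can be compared numerically with the exact binomial tail. The Python sketch below does this for one arbitrarily chosen set of parameters; the function names and the numbers are illustrative assumptions and play no role in the proof.

```python
from math import ceil, comb, exp

def binomial_upper_tail(x, p, threshold):
    """Exact Pr[S >= threshold] for S distributed as Binomial(x, p)."""
    return sum(comb(x, k) * p**k * (1 - p)**(x - k)
               for k in range(threshold, x + 1))

def chernoff_bound(x, p, gamma):
    """Upper bound exp(-gamma^2 * mu / 3) on Pr[S >= (1 + gamma) * mu],
    valid for 0 <= gamma <= 1, where mu = x * p."""
    return exp(-gamma**2 * x * p / 3)

# Arbitrary illustrative parameters: 200 trials, success probability 0.1.
x, p_success, gamma = 200, 0.1, 0.5
mu = x * p_success
exact = binomial_upper_tail(x, p_success, ceil((1 + gamma) * mu))
bound = chernoff_bound(x, p_success, gamma)
print(exact, bound)
```

The exact tail is noticeably smaller than the bound here, which is expected: the bound is used in the proof only because it has a convenient closed form.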


3  An  algorithm  for  constructing  optimal  schedules 

In  this  section,  we  describe  the  key  ideas  required  to  make  the  non-constructive  proof  of  [9]  constructive. 
There are many details in that proof, but changes are required only where the Lovász Local Lemma is used,
in Lemmas 3.2, 3.7 and 3.9 of [9]. The non-constructive proof showed that a schedule can be modified by
assigning delays to the packets in such a way that in the new schedule the relative congestion can be bounded
in much smaller frames than in the old schedule. In this paper, we show how to find the assignment of delays
quickly.  We  will  not  regurgitate  the  entire  proof  in  [9],  but  only  reprove  those  lemmas,  trying  to  state  the 
replacement  propositions  in  a  way  as  close  as  possible  to  the  original  lemmas. 

In Section 3.1, we provide a proposition, Proposition 3.2, that is a constructive version of Lemma 3.2 of [9].
In Sections 3.2 and 3.3, we provide three propositions that are meant to replace Lemma 3.7 of [9]. Lemma 3.7
is applied $O(\log^*(c + d))$ times in [9]. In this paper we will use Propositions 3.7.1 and 3.7.2 to replace the
first two applications of Lemma 3.7. The remaining applications will be replaced by Proposition 3.7.3. In
Section 3.4, we present the three replacement propositions for Lemma 3.9 of [9]. Our belief is that a reader
who  understands  the  structure  of  the  proof  in  [9]  and  the  propositions  in  this  paper  can  easily  see  how  to 
make  the  original  proof  constructive.  Finally,  we  analyze  the  running  time  of  our  algorithm  in  Section  3.5. 

3.1  The  first  reduction  in  frame  size 

For  a  given  set  of  N  packets,  let  c  and  d  denote  the  congestion  and  the  dilation  of  the  paths  taken  by  these 
packets, and let V denote the sum of the lengths of these paths. Let m be the number of edges traversed by
the packets (we can ignore the edges not traversed by any packet in the network). Note that $m \le V \le mc$.
The following proposition is meant to replace Lemma 3.2 of [9]. It is used just once in the proof, to reduce
the frame size from d to $\log V$.

Proposition 3.2 For any constant $\beta > 0$, there is a constant $\alpha > 0$, such that there exists an algorithm that
constructs a schedule of length $d + \alpha c$ in which packets never wait in edge queues and in which the relative
congestion in any frame of size $\log V$ or larger is at most 1. The algorithm runs in time at most $O(V)$, and
succeeds with probability at least $1 - 1/V^\beta$.

Proof: The algorithm is simple: assign each packet an initial delay that is chosen randomly, independently,
and uniformly from the range $[0, \alpha c]$, where $\alpha$ is a constant that will be specified later. The packet will wait
out its initial delay and then travel to its destination without stopping. The length of the new schedule is
at most $\alpha c + d$. Constructing the new schedule takes time at most $O(V)$.

To bound the relative congestion in frames of size $\log V$ or larger, we need to consider all m edges and,
by Lemma 3.5, all frames of size between $\log V$ and $2\log V - 1$. For any particular edge g, and T-frame $\tau$,
where $\log V \le T \le 2\log V - 1$, the probability, p, that more than T packets use g in $\tau$ is at most

$$p \le \binom{c}{T}\left(\frac{T}{\alpha c}\right)^{T} \le \left(\frac{e}{\alpha}\right)^{T},$$

since each of the at most c packets that pass through g has probability at most $T/\alpha c$ of using g in $\tau$.
(Note that e denotes the base of the natural logarithm.) The total number of frames to consider is at most
$(\alpha c + d)\log V$, since there are at most $\alpha c + d$ places for a frame to start, and $\log V$ frame sizes. Thus the
probability that the relative congestion is too large for any frame of size $\log V$ or larger is at most

$$m \log V\,(\alpha c + d)\left(\frac{e}{\alpha}\right)^{\log V}.$$

Using the inequalities $V \ge c$, $V \ge m$, and $V \ge d$, we have that for any constant $\beta > 0$, there exists a constant
$\alpha > 0$, such that this probability is at most $1/V^\beta$. □

Before applying Proposition 3.7.1, we first apply Proposition 3.2 to produce a schedule $S_1$ of length
$O(c + d)$ in which the relative congestion in any frame of size $\log V$ or larger is at most 1. For any positive
constant $\beta$, this step succeeds with probability at least $1 - 1/V^\beta$. If it fails, we simply try again.
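The first reduction, including the retry, can be sketched in a few lines of Python. The sketch assumes base-2 logarithms, rounds $\log V$ up to an integer, and uses $\alpha = 8$ as a placeholder, since the proposition only asserts that a suitable constant exists. The function names are hypothetical, and the frame check is a plain verification rather than the O(V)-time construction.

```python
import math
import random
from collections import Counter, defaultdict

def max_frame_load(steps, T):
    """Largest number of entries of the sorted list `steps` within T consecutive steps."""
    best, lo = 0, 0
    for hi, t in enumerate(steps):
        while steps[lo] <= t - T:
            lo += 1
        best = max(best, hi - lo + 1)
    return best

def frames_ok(paths, delays, T_min):
    """True if every edge carries at most T packets in every T-frame, T_min <= T <= 2*T_min - 1."""
    times = defaultdict(list)
    for delay, path in zip(delays, paths):
        for j, edge in enumerate(path):
            times[edge].append(delay + j)    # no waiting after the initial delay
    for steps in times.values():
        steps.sort()
    return all(max_frame_load(steps, T) <= T
               for steps in times.values()
               for T in range(T_min, 2 * T_min))

def first_reduction(paths, alpha=8, rng=random, max_tries=100):
    """Draw uniform initial delays in [0, alpha * c] and retry until the frames are acceptable."""
    c = max(Counter(e for path in paths for e in path).values())
    V = sum(len(path) for path in paths)
    T_min = max(1, math.ceil(math.log2(V)))
    for _ in range(max_tries):
        delays = [rng.randint(0, alpha * c) for _ in paths]
        if frames_ok(paths, delays, T_min):
            return delays
    raise RuntimeError("no suitable delays found; try a larger alpha")

# Hypothetical instance: 20 packets share one path of 5 edges (c = 20, d = 5, V = 100).
paths = [[(i, i + 1) for i in range(5)] for _ in range(20)]
delays = first_reduction(paths, rng=random.Random(0))
print(delays)
```

Checking only frame sizes between $\log V$ and $2\log V - 1$ suffices because of Lemma 3.5.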


3.2  A  randomized  algorithm  to  reduce  the  frame  size 

In this section, we prove two very similar propositions, Propositions 3.7.1 and 3.7.2, that are meant to replace
the first two applications of Lemma 3.7 of [9], which we state below. Let $I > 0$. We break a schedule S into
blocks of $2I^3 + 2I^2 - I$ consecutive time steps.

Lemma 3.7 [9] In a block of size $2I^3 + 2I^2 - I$, let the relative congestion in any frame of size I or greater
be at most r, where $1 \le r \le I$. Then there is a way of assigning delays to the packets so that in-between the
first and the last $I^2$ steps of this block, the relative congestion in any frame of size $I_1 = \log^2 I$ or greater is
at most $r_1 = r(1 + \epsilon_1)$, where $\epsilon_1 = O(1)/\sqrt{\log I}$.

After applying Proposition 3.2 to reduce the frame size from d to $\log V$, Proposition 3.7.1 is used to
reduce the frame size from $\log V$ to $(\log\log V)^2$, and then Proposition 3.7.2 is used to reduce the frame
size from $(\log\log V)^2$ to $\log^2((\log\log V)^2) = (\log\log\log V)^{O(1)}$. Unlike Lemma 3.7 of [9], Propositions 3.7.1
and 3.7.2 may increase the relative congestion by a constant factor. In general, we cannot afford to pay a
constant factor at each of the $O(\log^*(c + d))$ applications of Lemma 3.7 of [9], but we can afford to pay it
twice. For the application of Proposition 3.7.1, $I = \log V$ and $r = 1$. With probability at least $1 - 1/V^\beta$,
for any constant $\beta > 0$, we succeed in producing a schedule, $S_2$, in which the relative congestion is at most
$O(1)$ in frames of size $\log^2 I = (\log\log V)^2$ (if we should fail, we simply try again). In the application of
Proposition 3.7.2, $I = (\log\log V)^2$, and $r = O(1)$. In the resulting schedule, $S_3$, the relative congestion is at
most $O(1)$ in frames of size $\log^2((\log\log V)^2) = (\log\log\log V)^{O(1)}$, with probability at least $1 - 1/V^\beta$, for
any constant $\beta > 0$. At this point, we start using Proposition 3.7.3.

Proposition 3.7.1 Let the relative congestion in any frame of size I or greater be at most r in a block of
size $2I^3 + 2I^2 - I$, where $1 \le r \le I$ and $I = \log V$. Let Q be the sum of the lengths of the paths taken by
the packets within this block. Then there is an algorithm for assigning initial delays in the range $[0, I]$ to
the packets so that in-between the first and last $I^2$ steps of the block, the relative congestion in any frame of
size $\log^2 I$ or greater is at most $2r'$, where $r' = r(1 + \sigma)$ and $\sigma = O(1)/\sqrt{\log I}$. For any constant $\beta > 0$, the
algorithm runs in time at most $O(Q(\log\log V)\log V)$, with probability at least $1 - 1/V^\beta$.

Proof: We define the bad event for each edge g in the network and each T-frame $\tau$, where
$\log^2 I \le T \le 2\log^2 I - 1$, as the event that more than $r'T$ packets use g in $\tau$. A particular bad event may or
may not occur, i.e. may or may not be true, in a given schedule. If no bad event occurs, then by Lemma 3.5, the
relative congestion in all frames of size $\log^2 I$ or greater will be at most $r'$. Since there are $\log^2 I$ different
frame sizes and there are at most $(2I^3 + 2I^2 - I) + I = 2I^3 + 2I^2$ different frames of any particular size,
the total number of bad events involving any one edge is at most $(2I^3 + 2I^2)\log^2 I < I^4$, for I greater than
some large enough constant.

We now describe the algorithm for finding the assignment. We process the packets one at a time. To
each packet, we assign a delay chosen randomly, independently, and uniformly from 0 to I. We then examine
every event in which the packet participates. We say that the event for an edge g and a T-frame $\tau$ is
critical if delays have been assigned to C packets that could possibly use g in $\tau$, and, of these, more than
$CT/I + kr(I + T)T/(I\sqrt{\log I})$ packets actually use g during $\tau$, where k is a positive constant (to be specified
later). Intuitively, the event becomes critical if the number of packets assigned delays so far that traverse edge
g in $\tau$ exceeds the expected number of such packets ($CT/I$) by an excess term $kr(I + T)T/(I\sqrt{\log I})$, which
is the maximum final excess with respect to the final expected value that we allow. Note that $C \le r(T + I)$.
If a packet causes an event to become critical, then we set aside all of the other packets that could also use
g during $\tau$, but whose delays have not yet been assigned. Let P denote the set of packets that are assigned
delays. We will deal with the packets that have been set aside later. As we shall see, after one pass of
assigning random delays to the packets, the problem of scheduling the packets that have been set aside is
broken into a collection of much smaller subproblems, with probability at least $1 - 1/V^{\beta'}$, for any constant
$\beta' > 0$.
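One pass of this procedure can be sketched as follows. The sketch is a toy illustration under several assumptions: logarithms are base 2, frame sizes are rounded to integers, events are enumerated by brute force rather than within the stated time bound, and the constant k is left as a parameter because its value is only fixed later in the proof. The function name `one_pass` and the data layout are hypothetical.

```python
import math
import random
from collections import defaultdict

def one_pass(moves, I, r, k=1.0, rng=random):
    """One pass of random delay assignment within a block.

    moves[i]: list of (step, edge) pairs for packet i in the original schedule
              (paths are assumed edge-simple).
    Returns (delays, set_aside): the delays of the packets in P, and the set
    of packets set aside because an event they touch became critical.
    """
    log_i = math.log2(I)
    t_min = math.ceil(log_i ** 2)
    sizes = range(t_min, 2 * t_min)          # frame sizes T_min .. 2*T_min - 1

    def events(t, g):
        # Frames [s, s + T - 1] that meet [t, t + I], the steps at which a
        # packet crossing g at original step t may cross it after its delay.
        for T in sizes:
            for s in range(t - T + 1, t + I + 1):
                yield (g, s, T)

    could = defaultdict(set)                 # event -> packets that could use it
    for i, mv in enumerate(moves):
        for t, g in mv:
            for ev in events(t, g):
                could[ev].add(i)

    assigned = defaultdict(int)              # C for each event
    using = defaultdict(int)                 # assigned packets that use g inside the frame
    delays, set_aside = {}, set()
    for i, mv in enumerate(moves):
        if i in set_aside:
            continue
        x = rng.randint(0, I)
        delays[i] = x
        for t, g in mv:
            for ev in events(t, g):
                _, s, T = ev
                assigned[ev] += 1
                if s <= t + x <= s + T - 1:
                    using[ev] += 1
                limit = assigned[ev] * T / I + k * r * (I + T) * T / (I * math.sqrt(log_i))
                if using[ev] > limit:        # the event has become critical
                    set_aside |= {j for j in could[ev] if j not in delays}
    return delays, set_aside

# Hypothetical block: 40 packets follow the same 3-edge path, one per step (r = 1).
moves = [[(i + j, ("e", j)) for j in range(3)] for i in range(40)]
delays, set_aside = one_pass(moves, I=32, r=1, k=0.05, rng=random.Random(1))
print(len(delays), len(set_aside))
```

Every packet ends up either with a delay or in the set-aside group, never both; the packets set aside are the ones handled by the later steps of the proof.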

In a pass, we assign a random delay to each packet, and check whether the event for edge g and T-frame
$\tau$ becomes critical. We can do this checking as we construct the schedule, one time step at a time. When
considering time t, we compute the number of packets that traverse each edge g used during t and then we
compute the frame congestion of the T-frames ending at t, for $T \in [\log^2 I, 2\log^2 I - 1]$. Note that the frame
congestion of the T-frame ending at time t can be computed by taking the frame congestion of the T-frame
ending at $t - 1$, subtracting the occurrences of edges in time $t - T$ and adding the occurrences of edges in
time t. This can be done in time $O(Q\log^2 I) \le O(Q(\log\log V)^2)$, Q being the sum of the number of edges
in the paths taken by the packets within the block. In the remainder of this paper, we assume that we check
the congestion of all T-frames of a block, $\log^2 I \le T < 2\log^2 I$, as just described. If a pass fails to reduce
the component size, we try again.
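The incremental update described above amounts to a sliding window over time steps. The Python sketch below handles a single frame size T and records, for every edge, the largest number of traversals seen in any T-frame; running it once for each of the $\log^2 I$ frame sizes gives the check used in a pass. The input format `uses_at` and the function name are assumptions made for illustration.

```python
from collections import Counter

def max_frame_counts(uses_at, horizon, T):
    """For each edge, the largest number of traversals in any T-frame.

    uses_at: dict step -> Counter(edge -> packets crossing the edge at that step)
    horizon: number of time steps 0 .. horizon - 1 to scan
    """
    window = Counter()     # traversals per edge in the T-frame ending at step t
    worst = Counter()
    for t in range(horizon):
        if t - T >= 0:
            window.subtract(uses_at.get(t - T, {}))   # step t - T leaves the frame
        for edge, count in uses_at.get(t, {}).items():
            window[edge] += count                     # step t enters the frame
            # A count can reach a new maximum only when it has just increased.
            worst[edge] = max(worst[edge], window[edge])
    return worst

# Hypothetical usage pattern of a single edge "e".
uses_at = {0: Counter({"e": 1}), 1: Counter({"e": 2}), 3: Counter({"e": 1})}
print(max_frame_counts(uses_at, 4, 2)["e"])
```

Each (step, edge) entry is touched twice per frame size, once when it enters the window and once when it leaves, which matches the linear dependence on Q in the bound above.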

Since a total of at most $r(I + T)$ packets that traverse edge g have been assigned delays, the relative
congestion in $\tau$ due to the packets in P is at most
$[rT(I + T)(1 + k/\sqrt{\log I})/I]/T \le r(1 + 1/\sqrt{\log I})(1 + k/\sqrt{\log I}) \le r[1 + (2k + 1)/\sqrt{\log I}]$,
since $T < 2\log^2 I$ and $2\log^2 I/I < 1/\sqrt{\log I}$, for I large enough.
Choose k so that the relative congestion due to the packets in P in $\tau$ is at most $r' = r(1 + \sigma)$.

In order to proceed, we must introduce some notation. Let m' be the number of edges traversed by
some packet within the block. The dependence graph, G, is the graph in which there is a node for each
bad event, and an edge between two nodes if the corresponding events share a packet. Let b denote the
degree of G. Whether or not a bad event for an edge g and a time frame $\tau$ occurs depends solely on the
assignment of delays to the packets that pass through g. Thus, the bad event for an edge g and a time
frame $\tau$ and the bad event for an edge g' and a time frame $\tau'$ are dependent only if g and g' share a packet.
Since at most $r(2I^3 + 2I^2 - I) < rI^4$ packets pass through g, and each of these packets passes through at
most $2I^3 + 2I^2 - I < I^4$ other edges g', and there are at most $I^4$ time frames $\tau'$, for I large enough, the
dependence b is at most $rI^{12}$. For $r \le I$, we have $b \le I^{13}$. Since the packets use at most m' different edges
in the network, and for each edge there are at most $I^4$ bad events, the total number of nodes in G is at
most $m'I^4$. We say that a node in G is critical if the corresponding event is critical. We say that a node
is endangered if its event shares a packet with an event that is critical. Let $G_1$ denote the subgraph of G
consisting of the critical and endangered nodes and the edges between them. If a node is not in $G_1$, then
all of the packets that use the corresponding edge have already been assigned a delay (and since this event
is not critical, the number of packets using this edge in the corresponding time frame is at most $r'T$), and the
bad event represented by that node cannot occur, no matter how we assign delays to the packets not in P.
Hence, from here on we need only consider the nodes in $G_1$.

We are going to show that, with high probability, the size of the largest connected component $U$ of $G_1$ is
at most $I^{52}\log\mathcal{P}$. Since different components are not connected by edges in $G_1$, no two components
share a packet. Also any two events that involve edges traversed by the same packet share an edge in $G_1$,
and so are in the same connected component. Thus there exists a one-to-one correspondence between components
of $G_1$ and disjoint sets in a partition of the packets not in $P$. Hence, we can assign the delays to the
packets in each component separately, and we have reduced the size of a largest component in $G_1$ from
$m'I^4$ to $I^{52}\log\mathcal{P}$.
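To make the construction of $G_1$ concrete, the following sketch builds the critical and endangered nodes and splits them into connected components. It is only an illustration under assumed inputs: `event_packets` (a map from each bad event to the set of packets that can take part in it) and `critical` (the set of events already marked critical) are hypothetical names, and events are assumed to be sortable labels.

```python
from collections import defaultdict

def critical_components(event_packets, critical):
    """Return the connected components of G_1 as a list of sets of events."""
    # Index events by packet so that neighbours in the dependence graph
    # (events sharing a packet) can be found quickly.
    by_packet = defaultdict(set)
    for ev, pkts in event_packets.items():
        for p in pkts:
            by_packet[p].add(ev)

    def neighbours(ev):
        out = set()
        for p in event_packets[ev]:
            out |= by_packet[p]
        out.discard(ev)
        return out

    # Endangered nodes share a packet with some critical node.
    endangered = set()
    for ev in critical:
        endangered |= neighbours(ev)
    endangered -= set(critical)
    nodes = set(critical) | endangered

    # Depth-first search restricted to the nodes of G_1.
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for v in neighbours(u) & nodes:
                if v not in seen:
                    seen.add(v)
                    comp.add(v)
                    stack.append(v)
        components.append(comp)
    return components
```

Each returned component corresponds to an independent subproblem: the packets appearing in its events can be assigned delays without regard to the other components.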

The trick to bounding the size of the largest connected component $U$ is to observe that the subgraph of
critical nodes in $U$ is connected in the cube, $G_1^3$, of the graph $G_1$, i.e., the graph in which there is an
edge between two distinct nodes $u$ and $v$ if in $G_1$ there is a path of length at most 3 between $u$ and $v$. In
$G_1^3$, the critical nodes of $U$ form a connected subgraph because any path $u, e_1, e_2, e_3, v$ that connects two
critical or endangered nodes $u$ and $v$ by passing through three consecutive endangered nodes $e_1, e_2, e_3$ can
be replaced by two paths $u, e_1, e_2, w$ and $w, e_2, e_3, v$ of length three that each pass through $e_2$'s critical
neighbor $w$. Let $G_2$ denote the subgraph of $G_1^3$ consisting only of the critical nodes and the edges between
them. Note that the degree of $G_2$ is at most $b^3$, and if two nodes lie in the same connected component in
$G_2$, then they must also lie in the same connected component in $G_1^3$, and hence in $G_1$.

By a similar argument, any maximal independent set of nodes in a connected component of $G_2$ is connected
in $G_2^3$. Note that if a set of nodes is independent in $G_2$, then it must also be independent in $G_1^3$ and
in $G_1$. Let $G_3$ be the subgraph of $G_2^3$ induced by the nodes in a maximal independent set in $G_2$ (any such
maximal independent set in $G_2$ will do). The nodes in $G_3$ form an independent set of critical nodes in $G_1$.
The degree of $G_3$ is at most $b^9$.

Our goal now is to show that the number of nodes in any connected component $W$ of $G_3$ is at most
$\log\mathcal{P}$, with probability $1 - 1/\mathcal{P}^{\beta'}$, for any constant $\beta' > 0$. To begin, with every
connected component $W$ of $G_3$, we associate a spanning tree of $W$, $T_W$ (any such tree will do). Note that, if
$W$ and $W'$ are two distinct connected components of $G_3$, then $T_W$ and $T_{W'}$ are disjoint.

Now let us enumerate the different trees of size $t$ in $G_3$. To begin, a node is chosen as the root. Since
there are at most $m'I^4$ nodes in $G_3$, there are at most $m'I^4$ possible roots. Next, we construct the tree as
we perform a depth-first traversal of it. Nodes of the tree are visited one at a time. At each node $u$ in the
tree, either a previously unvisited neighbor of $u$ is chosen as the next node to be visited (and added to the
tree), or the parent of $u$ is chosen to be visited (at the root, the only option is to visit a previously
unvisited neighbor). Thus, at each node there are at most $b^9$ ways to choose the next node. Since each edge in
the tree is traversed once in each direction, and there are $t - 1$ edges, the total number of different trees
with any one root is at most $(b^9)^{2(t-1)} < b^{18t}$.

Any tree of size $t$ in $G_3$ corresponds to an independent set of size $t$ in $G_1$; moreover, to an independent
set of $t$ critical nodes in $G_1$. We can bound the probability that all of the nodes in any particular
independent subset $U$ of size $t$ in $G_1$ are critical as follows. Let $p_C$ be the probability that more than
$M = CT/I + kr(I + T)T/(I\sqrt{\log I})$ packets use edge $g$ in $\tau$. Then


Since $M = CT/I + kr(I + T)T/(I\sqrt{\log I})$ and $C \le r(I + T)$, using the Chernoff-type bound as in the proof
of Proposition 3.6, with $\mu = r(I + T)T/I$, $\gamma = \sqrt{3k_0/\log I}$, and $\log^2 I \le T < 2\log^2 I$, for any
large enough constants $k$ and $k_0$, we have

$$p_C = \Pr[S \ge M] \le \Pr[S \ge (1 + \gamma)\mu] \le e^{-\gamma^2\mu/3} = e^{-k_0 r(I+T)T/(I\log I)} \le e^{-k_0\log I} \le e^{-k_0\ln I} = 1/I^{k_0}.$$

This holds for any $C \le r(I + T)$, and thus the probability that the event for $g$ and $\tau$ becomes critical
after $C$ packets have been assigned delays is at most $1/I^{k_0}$. Since the nodes in $U$ are independent in $G_1$,
the corresponding events are also independent. Hence the probability that all of the nodes in the independent
set are critical is at most $1/I^{k_0 t}$. Thus the probability that there exists a tree of size $t$ in $G_3$ is at
most $m'I^4 I^{-(k_0 - 234)t}$ (since there exist at most $m'I^4 b^{18t}$ different trees of size $t$ in $G_3$ and
$b \le I^{13}$). Since $m' \le Q \le \mathcal{P}$, we can make this probability less than $1/\mathcal{P}^{\beta'}$, for
$t = \log\mathcal{P}$ and any constant $\beta' > 0$, by choosing $k_0$ to be a sufficiently large constant. Hence, with
probability at least $1 - 1/\mathcal{P}^{\beta'}$, the size of the largest spanning tree in $G_3$ will be at most
$\log\mathcal{P}$.
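The Chernoff-type estimate used in this argument can be sanity-checked numerically. The sketch below evaluates $e^{-\gamma^2\mu/3}$ with $\mu = r(I+T)T/I$ and $\gamma = \sqrt{3k_0/\log I}$ and compares it with the target $1/I^{k_0}$. The function name is hypothetical, and logarithms are assumed to be base 2, which the text does not state explicitly.

```python
import math

def critical_probability_bound(r, I, T, k0):
    """Return (Chernoff bound on p_C, target 1/I**k0) for the given parameters.

    Assumes log means log base 2; this is an assumption of the sketch.
    """
    mu = r * (I + T) * T / I                 # mean used in the Chernoff bound
    gamma = math.sqrt(3 * k0 / math.log2(I))  # relative deviation
    bound = math.exp(-gamma ** 2 * mu / 3)    # e^{-gamma^2 mu / 3}
    return bound, float(I) ** (-k0)
```

For example, with $r = 1$, $I = 256$, $T = \log^2 I = 64$, and $k_0 = 2$, the exponent $\gamma^2\mu/3$ equals $k_0 r(I+T)T/(I \log I) = 20$, so the bound is $e^{-20}$, comfortably below $1/I^{k_0} = 256^{-2}$.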

We can now bound the size of the largest connected component in $G_1$. Since the largest connected
component in $G_3$ has at most $t$ nodes, with probability at least $1 - 1/\mathcal{P}^{\beta'}$, and each of these
nodes may have $b^3$ neighbors in $G_2$, the largest connected component in $G_2$ contains at most $b^3 t$ nodes, with
the same probability. As we argued before, the critical nodes in any connected component of $G_1$ are connected
in $G_2$. Thus, the maximum number of critical nodes in any connected component of $G_1$ is at most $b^3 t$. Since
each of these nodes may have as many as $b$ endangered neighbors, the size of the largest connected component
in $G_1$ is at most $b^4 t \le I^{52}\log\mathcal{P}$, with high probability.

We still have to find a schedule for the packets not in $P$. We now have a collection of independent
subproblems to solve, one for each component in the dependence graph. We can use Proposition 3.6 to find
the initial delays for these packets. Since each node in the dependence graph corresponds to an edge in
the routing network, a component with $x$ nodes in the dependence graph corresponds to at most $x$, and
possibly fewer, edges in the routing network. After applying Proposition 3.6 once for some subproblem
with $q \le I^{52}\log\mathcal{P} = I^{53}$, since $I = \log\mathcal{P}$, the relative congestion in frames of size $\log^2 I$ or
larger in-between the first and last $I^2$ steps in the new schedule for this subproblem is at most $r'$, with
probability at least $1 - q/I^{\xi}$, for any constant $\xi > 0$. Hence, if we apply Proposition 3.6 to each
subproblem, then the relative congestion of the packets not in $P$, in frames of size $\log^2 I$ or larger
in-between the first and last $I^2$ steps in the new schedule, is at most $r'$, with probability at least
$1 - q/I^{\xi} \ge 1 - 1/I^{\xi - 53} = 1 - 1/(\log\mathcal{P})^{\xi - 53}$, since the subproblems are mutually independent
and disjoint.

If we apply Proposition 3.6 $\log\mathcal{P}/(\log\log\mathcal{P})$ times to each (mutually disjoint and independent)
routing subproblem, then for any constant $\xi > 53$ and $\mathcal{P}$ large enough, with probability at least
$1 - 1/(\log\mathcal{P})^{(\xi - 53)\log\mathcal{P}/(\log\log\mathcal{P})} \ge 1 - 1/\mathcal{P}^{\xi - 53}$, the relative congestion
in any frame of size $\log^2 I$ or greater is at most $r' = r(1 + O(1)/\sqrt{\log I})$.
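The effect of repeating Proposition 3.6 can be checked with a short calculation. Assuming base-2 logarithms (an assumption made here only for concreteness), an attempt that fails with probability at most $(\log\mathcal{P})^{-(\xi-53)}$, repeated independently $\log\mathcal{P}/\log\log\mathcal{P}$ times, fails every time with probability at most:

```latex
\[
\left(\frac{1}{(\log \mathcal{P})^{\xi-53}}\right)^{\log \mathcal{P}/\log\log \mathcal{P}}
= 2^{-(\xi-53)\,\log\log \mathcal{P}\,\cdot\,\log \mathcal{P}/\log\log \mathcal{P}}
= 2^{-(\xi-53)\log \mathcal{P}}
= \frac{1}{\mathcal{P}^{\,\xi-53}} .
\]
```

Any desired polynomial failure bound can therefore be met by choosing the constant $\xi$ large enough relative to the target exponent.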

Applying Proposition 3.6 $\log\mathcal{P}/(\log\log\mathcal{P})$ times and checking whether the resulting schedule is
feasible takes time at most $O(Q(\log\log\mathcal{P})^2 \log\mathcal{P}/(\log\log\mathcal{P})) = O(Q(\log\log\mathcal{P})\log\mathcal{P})$.

We now have schedules for the packets in $P$ and for the packets not in $P$. Both have relative congestion
$r(1 + O(1)/\sqrt{\log I})$, with probability at least $1 - 1/\mathcal{P}^{\beta'}$, for any constant $\beta' > 0$. When we
merge the two schedules, the relative congestion may be as large as the sum of the two relative congestions,
that is, $2r(1 + O(1)/\sqrt{\log I})$, with probability at least $1 - 1/\mathcal{P}^{\beta}$, for any fixed $\beta > 0$. The
running time of the algorithm is at most
$O((\log\log\mathcal{P} + \log\mathcal{P})\,Q\,(\log\log\mathcal{P})) = O(Q(\log\log\mathcal{P})\log\mathcal{P})$. □

Proposition 3.7.2 Let the relative congestion in any frame of size $I$ or greater be at most $r$ in a block of
size $2I^3 + 2I^2 - I$, where $1 \le r \le I$ and $I = (\log\log\mathcal{P})^2$. Let $Q$ be the sum of the lengths of the paths
taken by the packets within the block. Then there is an algorithm for assigning initial delays in the range
$[0, I]$ to the packets so that in-between the first and last $I^2$ steps of the block, the relative congestion in
any frame of size $\log^2 I$ or greater is at most $2r'$, where $r' = r(1 + \sigma)$ and $\sigma = O(1)/\sqrt{\log I}$. For any
constant $\beta > 0$, the algorithm runs in time at most $Q(\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$, with probability at
least $1 - 1/\mathcal{P}^{\beta}$.

Proof: The proof of this proposition is identical to the one presented for the previous proposition (we let
$I = (\log\log\mathcal{P})^2$ in that proof), except for the last part, when we assign delays to the packets that are
not in $P$.

In this case, we need to make another pass through the packets before applying Proposition 3.6
$\log\mathcal{P}/(\log\log\log\mathcal{P})^{O(1)}$ times to each component. In the first pass, we reduce the maximum component
size in $G_1$ from $m'I^4$ to $I^{52}\log\mathcal{P}$, with probability at least $1 - 1/\mathcal{P}^{\beta'}$, for any constant
$\beta' > 0$. In the second pass, we reduce the component size from $I^{52}\log\mathcal{P}$ down to
$I^{52}\log(I^{52}\log\mathcal{P}) \le I^{53}$, for large enough $I = (\log\log\mathcal{P})^2$, by taking $t = \log(I^{52}\log\mathcal{P})$,
and noting that we now have $m' \le I^{52}\log\mathcal{P}$. For any component, this step will succeed with probability
at least $1 - 1/(I^{52}\log\mathcal{P})^{\beta'}$, for any constant $\beta' > 0$. To make this probability as high as it was
in the case $I = \log\mathcal{P}$, if a pass fails for any component, we simply try to reduce the component size again,
up to $O(\log\mathcal{P}/(\log\log\mathcal{P}))$ times. Then with probability at least $1 - 1/\mathcal{P}^{\beta'}$, for any constant
$\beta' > 0$, we have reduced the component size to at most $I^{53}$, and the overall time taken by the two passes
was at most $O(Q\log^2((\log\log\mathcal{P})^2)\log\mathcal{P}/(\log\log\mathcal{P})) \le Q(\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}/(\log\log\mathcal{P})$,
with probability at least $1 - 2/\mathcal{P}^{\beta'}$. The second pass adds some packets to the set $P$. Let $P_i$ denote
the number of packets assigned delays in the $i$-th pass. Then the relative congestion due to these packets
will be at most
$[(P_1 + P_2)T/I + 2kr(I + T)T/(I\sqrt{\log I})]/T \le r(I + T)/I + 2kr(I + T)/(I\sqrt{\log I}) \le r[1 + T/I + 2k(I + T)/(I\sqrt{\log I})] \le r(1 + (4k + 1)/\sqrt{\log I})$,
since $T < 2\log^2 I$ and $2\log^2 I/I \le 1/\sqrt{\log I}$.

Now we apply Proposition 3.6 $\log\mathcal{P}/(\log\log\log\mathcal{P})^{O(1)}$ times, assigning delays to the packets not in
$P$ in time $Q(\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$, obtaining a feasible schedule for these packets with relative
congestion $r(1 + O(1)/\sqrt{\log I})$, with probability at least $1 - 1/\mathcal{P}^{\beta'}$, for any fixed $\beta' > 0$.

We have schedules for the packets in $P$ and for the packets not in $P$. Both have relative congestion
$r(1 + O(1)/\sqrt{\log I})$, with probability at least $1 - 2/\mathcal{P}^{\beta'}$, for any constant $\beta' > 0$. The total work
performed was at most $Q(\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$. When we merge the two schedules, the resulting relative
congestion may be as large as the sum of the two relative congestions, that is, $2r(1 + O(1)/\sqrt{\log I})$, with
probability at least $1 - 1/\mathcal{P}^{\beta}$, for any fixed $\beta > 0$. □

3.3  Applying  exhaustive  search 

The remaining $O(\log^*(c + d))$ applications of Lemma 3.7 in [9] are replaced by applications of the following
proposition, which uses the same technique as Propositions 3.7.1 and 3.7.2 except that instead of using
Proposition 3.6 for each component of the subgraph induced by critical and endangered nodes in the
dependence graph, it uses the Lovasz Local Lemma and exhaustive search to find the settings of the delays for
the packets.

Proposition 3.7.3 Suppose we have a block of size $2I^3 + 2I^2 - I$. Let $Q$ be the sum of the lengths of
the paths taken by the packets within this block, and let the relative congestion in any frame of size $I$ or
greater in this block be at most $r$, where $1 \le r \le I$ and $I \le (\log\log\log\mathcal{P})^{O(1)}$. Then there is some way
of assigning initial delays in the range $[0, I]$ to the packets so that the relative congestion in any frame of
size $\log^2 I$ or greater in-between the first and last $I^2$ steps in the resulting schedule is at most $r'$, where
$r' = r(1 + \sigma)$ and $\sigma = O(1)/\sqrt{\log I}$. Furthermore, for any constant $\beta > 0$, this assignment can be found in
$Q(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$ time, with probability at least $1 - 1/\mathcal{P}^{\beta}$.

Proof: The proof uses the Lovasz Local Lemma to show that an assignment of initial delays satisfying the
conditions of the proposition exists.

We first assign delays to some packets by making three passes through the packets using the algorithm of
Proposition 3.7.1. After the first pass, with probability at least $1 - 1/\mathcal{P}^{\beta'}$, for any constant $\beta' > 0$,
the size of the largest remaining subproblem will be $I^{52}\log\mathcal{P}$. We need to make two more passes, reducing
the size of the largest subproblem first to $I^{52}\log(I^{52}\log\mathcal{P})$, and then to
$I^{52}\log(I^{52}\log(I^{52}\log\mathcal{P})) = (\log\log\log\mathcal{P})^{O(1)}$ (for $I \le (\log\log\log\mathcal{P})^{O(1)}$). As in the second
pass in the proof of Proposition 3.7.2, the two additional passes are repeated $O(\log\mathcal{P}/(\log\log\mathcal{P}))$ and
$\log\mathcal{P}/(\log\log\log\mathcal{P})^{O(1)}$ times, respectively, for each component, if we fail to reduce the component size
as desired. For any constant $\beta > 0$, we succeed in reducing the component size to $(\log\log\log\mathcal{P})^{O(1)}$ in
time at most $Q(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$, with probability at least $1 - 1/\mathcal{P}^{\beta}$.

We now use the Lovasz Local Lemma to show that there exists a way of completing the assignment of
delays (i.e., to assign delays to the packets not in $P$) so that the relative congestion in frames of size
$\log^2 I$ or greater in this block is at most $r(1 + O(1)/\sqrt{\log I})$. We associate a bad event with each edge and
each time frame of size $\log^2 I$ through $(2\log^2 I) - 1$. The bad event for an edge $g$ and a particular $T$-frame
$\tau$ occurs when more than $M_g = (r(I + T) - C_g)T/I + \sigma' r(I + T)T/I$ packets not in $P$ use edge $g$ in $\tau$,
where $\sigma' = O(1)/\sqrt{\log I}$ and $C_g$ is the number of packets in $P$ that traverse edge $g$ during $\tau$. As we
argued before, the total number of bad events involving any one edge is at most $I^4$. We show that if each
packet not in $P$ is assigned a delay chosen randomly, independently, and uniformly from the range $[0, I]$,
then with non-zero probability no bad event occurs. In order to apply the lemma, we must bound both the
dependence of the bad events, and the probability that any bad event occurs. The dependence $b$ is at most
$I^{13}$, as argued before. For any edge $g$ and $T$-frame $\tau$ that contains $g$, where
$\log^2 I \le T \le (2\log^2 I) - 1$, the probability $p_g$ that more than $M_g$ packets not in $P$ use $g$ in $\tau$ can be
shown to be at most $1/I^{14}$, for sufficiently large $\sigma'$, using exactly the same Chernoff-bound argument that
was used in Proposition 3.7.1. Hence, $4\max\{p_g\}\,b \le 4/I < 1$ (for $I > 4$), and by the Lovasz Local Lemma,
there is some way of assigning delays to the packets not in $P$ so that no bad event occurs.
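The arithmetic behind the Local Lemma condition can be checked exactly. The sketch below assumes the symmetric form of the condition, $4pb < 1$, with $p = 1/I^{14}$ and $b = I^{13}$ as in the argument above; the function name `lll_product` is introduced here only for illustration.

```python
from fractions import Fraction

def lll_product(I):
    """Return 4 * p * b for p = 1/I**14 and b = I**13, as an exact fraction."""
    p = Fraction(1, I ** 14)  # bound on the probability of any one bad event
    b = I ** 13               # bound on the dependence of the bad events
    return 4 * p * b


def lll_condition_holds(I):
    """True when the symmetric Local Lemma condition 4pb < 1 is satisfied."""
    return lll_product(I) < 1
```

The product simplifies to $4/I$, so the condition holds exactly when $I > 4$, matching the requirement stated in the proof.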

Since at most $r(T + I)$ packets pass through the edge associated with any critical node, and there are
at most $I + 1$ choices for the delay assigned to each packet, the number of different possible assignments
for any subproblem containing $(\log\log\log\mathcal{P})^{O(1)}$ critical nodes is at most
$(I + 1)^{r(I + T)(\log\log\log\mathcal{P})^{O(1)}} \le I^{2I^2(\log\log\log\mathcal{P})^{O(1)}}$ (since $r \le I$ and $T < 2\log^2 I$).
For $I \le (\log\log\log\mathcal{P})^{O(1)}$ and $\mathcal{P}$ larger than some constant, this quantity is smaller than
$(\log\mathcal{P})^{\gamma}$, for any fixed constant $\gamma > 0$. Hence, we need to try out at most $\log^{\gamma}\mathcal{P}$ possible
delay assignments.
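The exhaustive search over delay assignments can be illustrated on a toy subproblem. This is a simplified sketch under stated assumptions, not the procedure of the proof: it ignores the packets already fixed in $P$ and the first and last $I^2$ steps, uses a single frame size $T$, and takes a plain integer `limit` in place of the threshold $M_g$. The names `feasible` and `exhaustive_delays` are hypothetical.

```python
from collections import defaultdict
from itertools import product

def feasible(paths, delays, T, limit):
    """Check that no edge is crossed more than `limit` times in any T-frame.

    A packet with path `path` and delay d crosses path[s] at step d + s.
    """
    times = defaultdict(list)
    for path, d in zip(paths, delays):
        for step, edge in enumerate(path):
            times[edge].append(d + step)
    for ts in times.values():
        ts.sort()
        # limit + 1 crossings fit in one T-frame iff they span fewer than T steps.
        for i in range(len(ts) - limit):
            if ts[i + limit] - ts[i] < T:
                return False
    return True

def exhaustive_delays(paths, I, T, limit):
    """Try all (I + 1) ** len(paths) delay assignments; return the first feasible one."""
    for delays in product(range(I + 1), repeat=len(paths)):
        if feasible(paths, delays, T, limit):
            return delays
    return None
```

For instance, three packets that all cross a single edge, with $I = 2$, $T = 1$, and `limit = 1`, must receive three distinct delays, and the search returns `(0, 1, 2)`. With $I = 1$ there are only two possible delays, so no assignment exists and the search returns `None`. The number of candidates grows as $(I+1)^{n}$ for $n$ packets, which is why the proof first shrinks the subproblems to size $(\log\log\log\mathcal{P})^{O(1)}$.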
After the four passes (consider the exhaustive search we perform to assign delays to the packets not in
$P$ as the fourth pass), the number of packets that use an edge $g$ in any $T$-frame is at most

$$\sum_{i=1}^{4}\left(\frac{C_i T}{I} + \sigma'\,\frac{r(I + T)T}{I}\right)$$

with probability at least $1 - 1/\mathcal{P}^{\beta}$, where $C_i$ is the number of packets that could have possibly
traversed edge $g$, and that were assigned delays in the $i$-th pass. Note that $C_g = C_1 + C_2 + C_3$. Since
$C_1 + C_2 + C_3 + C_4 \le r(I + T)$, the number of packets that traverse any edge in any $T$-frame is at most

$$\frac{r(I + T)T}{I} + 4\sigma'\,\frac{r(I + T)T}{I},$$

which means that the relative congestion in any $T$-frame, where $\log^2 I \le T < 2\log^2 I$, is at most

$$r\,\frac{(I + T)}{I}\,(1 + 4\sigma') = r\left(1 + \frac{T}{I}\right)(1 + 4\sigma') \le r\left(1 + \frac{O(1)}{\sqrt{\log I}}\right) = r(1 + \sigma),$$

as claimed, since $2\log^2 I/I \le 1/\sqrt{\log I}$, for $I$ large enough.

We can bound the total time taken by the algorithm as follows. The first three passes take time at most
$Q(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$, with probability at least $1 - 1/\mathcal{P}^{\beta}$. After the third pass, we solve
subproblems containing $(\log\log\log\mathcal{P})^{O(1)}$ critical nodes exhaustively. For each subproblem, for each of
the at most $\log^{\gamma}\mathcal{P}$ possible assignments of delays to the packets in the subproblem, for each $T$-frame
$\tau$ in the subproblem, for every edge $g$ in $\tau$, for $\log^2 I \le T < 2\log^2 I$, we must check whether more than
$M_g$ packets traverse $g$ during $\tau$. This takes time at most
$O(Q\log^2 I\,\log^{\gamma}\mathcal{P}) \le Q(\log\log\log\log\mathcal{P})^{O(1)}\log^{\gamma}\mathcal{P}$, for $\mathcal{P}$ large enough, for any
fixed $\gamma > 0$. In particular, for $\gamma = 1$ and $\mathcal{P}$ large enough, this quantity is bounded by
$Q(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$. Hence the overall time taken is bounded by
$Q(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$.

□ 


3.4  Moving  the  block  boundaries 

Now we present the three replacement propositions for Lemma 3.9 of [9], which bounds the relative congestion
after we move the block boundaries (see [9]). The three propositions that follow are analogous to the three
replacement propositions, Propositions 3.7.1-3, for Lemma 3.7 of [9]. Suppose we have a block of size
$2I^3 + 3I^2$, obtained after we inserted delays into the schedule as described in Propositions 3.6 and 3.7.1-3,
and moved the block boundaries as described in [9]. Each proposition refers to a specific size of $I$. Note
that in [9], the steps between steps $I^3$ and $I^3 + 3I^2$ in the block are called the "fuzzy region" of the block.
We assume that the relative congestion in any frame of size $I$ or greater in the block is at most $r$, where
$1 \le r \le I$.

Proposition 3.9.1 Let $I = \log\mathcal{P}$. Let $Q$ be the sum of the lengths of the paths taken by the packets within
the block. Then there is an algorithm for assigning delays in the range $[0, I^2]$ to the packets such that
in-between steps $I\log^3 I$ and $I^3$ and in-between steps $I^3 + 3I^2$ and $2I^3 + 3I^2 - I\log^3 I$, the relative
congestion in any frame of size $\log^2 I$ or greater is at most $2r_1$, where $r_1 = r(1 + \sigma_1)$ and
$\sigma_1 = O(1)/\sqrt{\log I}$, and such that in-between steps $I^3$ and $I^3 + 3I^2$, the relative congestion in any frame
of size $\log^2 I$ or greater is at most $2r_2$, where $r_2 = r(1 + \sigma_2)$ and $\sigma_2 = O(1)/\sqrt{\log I}$. For any
constant $\beta > 0$, this algorithm runs in time at most $O(Q(\log\log\mathcal{P})\log\mathcal{P})$, with probability at least
$1 - 1/\mathcal{P}^{\beta}$.


Proposition 3.9.2 Let $I = (\log\log\mathcal{P})^2$. Let $Q$ be the sum of the lengths of the paths taken by the packets
within the block. Then there is an algorithm for assigning delays in the range $[0, I^2]$ to the packets such
that in-between steps $I\log^3 I$ and $I^3$ and in-between steps $I^3 + 3I^2$ and $2I^3 + 3I^2 - I\log^3 I$, the relative
congestion in any frame of size $\log^2 I$ or greater is at most $2r_1$, where $r_1 = r(1 + \sigma_1)$ and
$\sigma_1 = O(1)/\sqrt{\log I}$, and such that in-between steps $I^3$ and $I^3 + 3I^2$, the relative congestion in any frame
of size $\log^2 I$ or greater is at most $2r_2$, where $r_2 = r(1 + \sigma_2)$ and $\sigma_2 = O(1)/\sqrt{\log I}$. For any
constant $\beta > 0$, this algorithm runs in time at most $Q(\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$, with probability at
least $1 - 1/\mathcal{P}^{\beta}$.

Proposition 3.9.3 Let $I \le (\log\log\log\mathcal{P})^{O(1)}$. Let $Q$ be the sum of the lengths of the paths taken by the
packets within the block. Then there is an algorithm for assigning delays in the range $[0, I^2]$ to the packets
such that in-between steps $I\log^3 I$ and $I^3$ and in-between steps $I^3 + 3I^2$ and $2I^3 + 3I^2 - I\log^3 I$, the
relative congestion in any frame of size $\log^2 I$ or greater is at most $r_1 = r(1 + \sigma_1)$, where
$\sigma_1 = O(1)/\sqrt{\log I}$, and such that in-between steps $I^3$ and $I^3 + 3I^2$, the relative congestion in any frame
of size $\log^2 I$ or greater is at most $r_2 = r(1 + \sigma_2)$, where $\sigma_2 = O(1)/\sqrt{\log I}$. For any constant
$\beta > 0$, this algorithm runs in time at most $Q(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$, with probability at least
$1 - 1/\mathcal{P}^{\beta}$.

3.5  Running  time 

Theorem 3.1 For any constant $\beta > 0$, the overall running time of the algorithm is at most
$O(\mathcal{P}(\log\log\mathcal{P})\log\mathcal{P})$, with probability at least $1 - 1/\mathcal{P}^{\beta}$.

Proof: Note that each of the propositions in the previous sections dealt only with a single block. For
any $I$, partitioning the schedule into disjoint blocks and moving the block boundaries as described in [9]
take $O(\mathcal{P})$ time. Let $n_I$ be the number of blocks in the schedule for the given $I$. Assume the blocks
are numbered from 1 to $n_I$. Note that $\sum_{i=1}^{n_I} m_i$ may be as large as $\mathcal{P}$ and $\sum_{i=1}^{n_I} Q_i = \mathcal{P}$,
where $Q_i$ and $m_i$ are the sum of the path lengths and the number of edges used within block $i$, respectively.
Let $\beta$ be any positive constant. For each partition of the schedule for a given
$I \le (\log\log\log\mathcal{P})^{O(1)}$, we apply Propositions 3.7.3 and 3.9.3 for each block $i$ in this partition,
$1 \le i \le n_I$, taking overall time at most $\mathcal{P}(\log\log\log\log\mathcal{P})^{O(1)}\log\mathcal{P}$. Since we will repartition
the schedule $O(\log^*(c + d))$ times after we bring $I$ down to $(\log\log\log\mathcal{P})^{O(1)}$, the overall running time
due to applications of Propositions 3.7.3 and 3.9.3 is at most
$\mathcal{P}(\log\log\log\log\mathcal{P})^{O(1)}(\log\mathcal{P})\log^*(c + d)$. Thus the total running time of the algorithm is at most
$O(\mathcal{P}) + [O(\log\log\mathcal{P}) + (\log\log\log\mathcal{P})^{O(1)} + (\log\log\log\log\mathcal{P})^{O(1)}\log^*(c + d)]\,\mathcal{P}\log\mathcal{P} \le O(\mathcal{P}(\log\log\mathcal{P})\log\mathcal{P})$,
for $\mathcal{P}$ large enough, with probability at least $1 - 1/\mathcal{P}^{\beta}$ (since each of the propositions is successful
with probability at least $1 - 1/\mathcal{P}^{\beta'}$, for any positive constant $\beta'$). Note that we used the inequalities
$\mathcal{P} \ge c$ and $\mathcal{P} \ge d$. □

4  A  parallel  scheduling  algorithm 

At first glance, it seems as though the algorithm that was described in Section 3 is inherently sequential.
This  is  because  the  decision  concerning  whether  or  not  to  assign  a  delay  to  a  packet  is  made  sequentially. 

In  particular,  a  packet  is  deferred  (i.e.,  not  assigned  a  delay)  if  and  only  if  it  might  be  involved  in  an  event 
that  became  critical  because  of  the  delays  assigned  to  prior  packets. 

In  [1],  Alon  describes  a  parallel  version  of  Beck’s  algorithm  which  proceeds  by  assigning  values  to  all 
random  variables  (in  this  case  delays  to  all  packets)  in  parallel,  and  then  unassigning  values  to  those  variables 
that  are  involved  in  bad  events.  The  Alon  approach  does  not  work  in  this  application  because  we  cannot 
afford  the  constant  factor  blow-up  in  relative  congestion  that  would  result  from  this  process. 

Rather,  we  develop  an  alternative  method  for  parallelizing  the  algorithm.  The  key  idea  is  to  process  the 
packets  in  a  random  order.  At  each  step,  all  packets  that  do  not  share  an  edge  with  an  as-yet-unprocessed 
packet  of  higher  priority  are  processed  in  parallel. 
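The random-order rule can be simulated directly. In the sketch below, `conflicts` is a hypothetical symmetric map from each packet to the packets with which it can share an event, and `priority` assigns each packet a distinct rank (smaller rank means higher priority). A packet is processed in the first round in which all of its higher-priority neighbours have already been processed, so its round number is one more than the largest round among those neighbours.

```python
import random

def random_priorities(packets, seed=None):
    """Assign distinct ranks to the packets in a uniformly random order."""
    order = list(packets)
    random.Random(seed).shuffle(order)
    return {p: rank for rank, p in enumerate(order)}

def parallel_rounds(conflicts, priority):
    """Return a dict mapping each packet to the round in which it is processed."""
    round_of = {}
    # Visiting packets by increasing rank guarantees that every
    # higher-priority neighbour already has its round assigned.
    for p in sorted(conflicts, key=priority.get):
        earlier = [round_of[q] for q in conflicts[p] if priority[q] < priority[p]]
        round_of[p] = 1 + max(earlier, default=0)
    return round_of
```

The number of rounds, `max(parallel_rounds(...).values())`, equals the number of nodes on the longest chain of neighbours with strictly decreasing priority, which is the quantity the analysis of the parallel running time needs to bound.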

To analyze the parallel running time of this algorithm, we first make a dependency graph $G'$ with a node
for every packet and an edge between two nodes if the corresponding packets can be involved in the same
event. Each edge is directed towards the node corresponding to the packet of lesser priority. By Brent's
Theorem [4], the parallel running time of the algorithm is then at most twice the length of the longest
directed path in $G'$.

Let $D$ denote the maximum degree of $G'$. There are at most $ND^L$ paths of length $L$ in $G'$. The
probability that any particular path of length $L$ has all of its edges directed in the same way is at most
$2/L!$ (the factor of 2 appears because there are two possible orientations for the edges). Hence, with
probability near 1, the longest directed path length in $G'$ is $O(D + \log N)$. This is because if
$L \ge k(D + \log N)$, for some large constant $k$, then $ND^L \cdot 2/L! \ll 1$.

Each  packet  can  be  involved  in  at  most  r(2/^  +  P)  log^  I  events,  and  at  most  r(I  +  T)  <  0(1)  packets 
can  be  involved  in  the  same  event.  Hence,  the  degree  D  of  G'  is  at  most  0(P  log^  I).  By  using  the  method 
of Proposition 3.2 as a preprocessing phase, we can assume that c, d, and thus I, are all polylogarithmic in
log N. Hence, the parallel algorithm runs in NC, as claimed.

5  Remarks 

The  algorithms  described  in  this  paper  are  randomized,  but  they  can  be  derandomized  using  the  method  of 
conditional  probabilities  [12,  13]. 